Capturing Spatially Varying Anisotropic Reflectance Parameters using Fourier Analysis
Reflectance parameters condition the appearance of objects in photorealistic rendering. Practical acquisition of reflectance parameters is still a difficult problem, even more so for spatially varying or anisotropic materials, which increase the number of samples required. In this paper, we present an algorithm for acquisition of spatially varying anisotropic materials, sampling only a small number of directions. Our algorithm uses Fourier analysis to extract the material parameters from a sub-sampled signal. We are able to extract diffuse and specular reflectance, direction of anisotropy, surface normal and reflectance parameters from as few as 20 sample directions. Our system makes no assumption about the stationarity or regularity of the materials, and can recover anisotropic effects at the pixel level.
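As a rough illustration of how Fourier analysis can recover a direction from a small number of angular samples, the sketch below fits a toy signal of the form a0 + a2*cos(2*(phi - phi0)) from 20 evenly spaced directions. The signal model, the function name `fit_second_harmonic`, and all numeric values are assumptions made for this example; the abstract does not specify the reflectance model or the sampling layout the authors use.

```python
import numpy as np

def fit_second_harmonic(samples):
    """Recover mean, amplitude and phase of a pi-periodic cosine.

    Assumes the samples are taken at n evenly spaced angles over [0, 2*pi)
    (a hypothetical layout chosen for this sketch).
    """
    n = len(samples)
    spectrum = np.fft.fft(samples) / n
    mean = spectrum[0].real              # DC term
    c2 = spectrum[2]                     # second harmonic (period pi)
    amplitude = 2.0 * np.abs(c2)
    # For a0 + a2*cos(2*(phi - phi0)), bin 2 equals (a2/2)*exp(-2j*phi0).
    phi0 = (-np.angle(c2) / 2.0) % np.pi
    return mean, amplitude, phi0

# Toy signal sampled in 20 directions (values are illustrative only).
n_dirs = 20
phi = 2.0 * np.pi * np.arange(n_dirs) / n_dirs
true_mean, true_amp, true_phi0 = 0.6, 0.25, 0.7
signal = true_mean + true_amp * np.cos(2.0 * (phi - true_phi0))

mean, amplitude, phi0 = fit_second_harmonic(signal)
```

The phase of the second harmonic is used because an anisotropy direction is only defined modulo pi, so the recovered angle is wrapped into [0, pi).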
Blur Interpolation Transformer for Real-World Motion from Blur
This paper studies the challenging problem of recovering motion from blur,
also known as joint deblurring and interpolation or blur temporal
super-resolution. The remaining challenges are twofold: 1) current methods
still leave considerable room for improvement in visual quality, even
on synthetic datasets, and 2) they generalize poorly to real-world data. To
this end, we propose a blur interpolation transformer (BiT) to effectively
unravel the underlying temporal correlation encoded in blur. Based on
multi-scale residual Swin transformer blocks, we introduce dual-end temporal
supervision and temporally symmetric ensembling strategies to generate
effective features for time-varying motion rendering. In addition, we design a
hybrid camera system to collect the first real-world dataset of one-to-many
blur-sharp video pairs. Experimental results show that BiT achieves a significant
gain over state-of-the-art methods on the public dataset Adobe240. In addition,
the proposed real-world dataset helps the model generalize to
real blurry scenarios.
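To make the idea of one-to-many blur-sharp pairs concrete, the sketch below builds a single blurry frame by averaging a short window of sharp frames and pairs it with each of them. Averaging sharp frames is a common way to synthesize blur from high-frame-rate video, but it is an assumption here: the abstract does not describe how its data are constructed, and the function name `synthesize_blur` and the toy frames are hypothetical.

```python
import numpy as np

def synthesize_blur(sharp_frames):
    """Approximate one blurry exposure as the mean of a window of sharp frames."""
    frames = np.asarray(sharp_frames, dtype=np.float64)
    return frames.mean(axis=0)

# Toy sequence: a bright dot moving one pixel per frame along a 1x8 row.
sharp = np.zeros((4, 1, 8))
for t in range(4):
    sharp[t, 0, t + 2] = 1.0

blur = synthesize_blur(sharp)

# One blurry input maps to many sharp targets, one per time step.
pairs = [(blur, sharp[t]) for t in range(sharp.shape[0])]
```

Recovering motion from blur is the inverse problem: given only `blur`, predict the sharp frame for a requested time step.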
Pathological Evidence Exploration in Deep Retinal Image Diagnosis
Though deep learning has shown successful performance in classifying the
label and severity stage of certain diseases, most models give little evidence of
how they make predictions. Here, we propose to exploit the interpretability of
deep learning applications in medical diagnosis. Inspired by Koch's Postulates,
a well-known strategy in medical research to identify the properties of a pathogen,
we define a pathological descriptor that can be extracted from the activated
neurons of a diabetic retinopathy detector. To visualize the symptoms and
features encoded in this descriptor, we propose a GAN-based method to synthesize
a pathological retinal image given the descriptor and a binary vessel
segmentation. Moreover, with this descriptor, we can arbitrarily manipulate the
position and quantity of lesions. As verified by a panel of 5 licensed
ophthalmologists, our synthesized images carry the symptoms that are directly
related to diabetic retinopathy diagnosis. The panel survey also shows that our
generated images are both qualitatively and quantitatively superior to those of existing
methods.
Comment: to appear in AAAI (2019). The first two authors contributed equally
to the paper. Corresponding Author: Feng L
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Image cropping has progressed tremendously under the data-driven paradigm.
However, current approaches do not account for the intentions of the user,
which is an issue especially when the composition of the input image is
complex. Moreover, labeling of cropping data is costly and hence the amount of
data is limited, leading to poor generalization performance of current
algorithms in the wild. In this work, we take advantage of vision-language
models as a foundation for creating robust and user-intentional cropping
algorithms. By adapting a transformer decoder with a pre-trained CLIP-based
detection model, OWL-ViT, we develop a method to perform cropping with a text
or image query that reflects the user's intention as guidance. In addition, our
pipeline design allows the model to learn text-conditioned aesthetic cropping
with a small cropping dataset, while inheriting the open-vocabulary ability
acquired from millions of text-image pairs. We validate our model through
extensive experiments on existing datasets as well as a new cropping test set
we compiled that is characterized by content ambiguity.